47 research outputs found
Meet Charles, big data query advisor
In scientific data management and business analytics, the most informative queries are a holy grail. Data collection becomes increasingly simpler, yet data exploration gets significantly harder. Exploratory querying is likely to return an empty or an overwhelming result set. On the other hand, data mining algorithms require extensive preparation, ample time and do not scale well. In this paper, we address this challenge at its core, i.e., how to query the query space associated with a given database. The space considered is formed by conjunctive predicates. To express them, we introduce the Segmentation Description Language (SDL). The user provides a query. Charles, our query advisory system, breaks its extent into meaningful segments and returns the subsequent SDL descriptions. This provides insight into the set described and offers the user directions for further exploration. We introduce a novel algorithm to generate SDL answers. We evaluate them using four orthogonal criteria: homogeneity, simplicity, breadth, and entropy. A prototype implementation has been constructed and the landscape of follow-up research is sketched
X-Device Query Processing by Bitwise Distribution
The diversity of hardware components within a single system calls for strategies for efficient cross-device data processing. For exam- ple, existing approaches to CPU/GPU co-processing distribute individual relational operators to the “most appropriate” device. While pleasantly simple, this strategy has a number of problems: it may leave the “inappropriate” devices idle while overloading the “appropriate” device and putting a high pressure on the PCI bus. To address these issues we distribute data among the devices by par- tially decomposing relations at the granularity of individual bits. Each of the resulting bit-partitions is stored and processed on one of the available devices. Using this strategy, we implemented a processor for spatial range queries that makes efficient use of all available devices. The performance gains achieved indicate that bitwise distribution makes a good cross-device processing strategy
Blaeu: Mapping and navigating large tables with cluster analysis
Blaeu is an interactive database exploration tool. Its aim is to guide casual users through large data tables, ultimately triggering insights and serendipity. To do so, it relies on a double cluster analysis mechanism. It clusters the data vertically: it detects themes, groups of mutually dependent columns that highlight one aspect of the data. Then it clusters the data horizontally. For each theme, it produces a data map, an interactive visualization of the clusters in the table. The data maps summarize the data. They provide a visual synopsis of the clusters, as well as facilities to inspect their content and annotate them. But they also let the users navigate further. Our explorers can change the active set of columns or drill down into the clusters to refine their selection. Our prototype is fully operational, ready to deliver insights from complex databases
Scalable Generation of Synthetic GPS Traces with Real-life Data Characteristics
Database benchmarking is most valuable if real-life data and workloads are
available. However, real-life data (and workloads) are often not publicly
available due to IPR constraints or privacy concerns. And even if
available, they are often limited regarding scalability and variability of
data characteristics. On the oth
Combining design and performance in a data visualization management system
Interactive data visualizations have emerged as a prominent way to bring data exploration and analysis capabilities to both technical and non-technical users. Despite their ubiquity and importance across applications, multiple design- and performance-related challenges lurk beneath the visualization creation process. To meet these challenges, application designers either use visualization systems (e.g., Endeca, Tableau, and Splunk) that are tailored to domain-specific analyses, or manually design, implement, and optimize their own solutions. Unfortunately, both approaches typically slow down the creation process. In this paper, we describe the status of our progress towards an end-to-end relational approach in our data visualization management system (DVMS). We introduce DeVIL, a SQL-like language to express static as well as interactive visualizations as database views that combine user inpu
Have a chat with Clustine, conversational engine to query large tables
Thanks the recent advances of AI and the stellar popularity of messaging apps (e.g., WhatsApp), chatbots are no longer bound to customer support services and computer museums. Indeed, they provide a mighty, lightweight and accessible way to provide services over the Internet. In this paper, we introduce Clustine, a chatbot to help users query large tables through short messages. The main idea is to combine cluster analysis and text generation to compress query results, describe them with natural language and make recommendations. We present the architecture of our system, demonstrate it with two use cases, and present early validation experiments with 12 real datasets to show that its promises are reachable
Combining design and performance in a data visualization management system
Interactive data visualizations have emerged as a prominent way to bring data exploration and analysis capabilities to both technical and non-technical users. Despite their ubiquity and importance across applications, multiple design- and performance-related challenges lurk beneath the visualization creation process. To meet these challenges, application designers either use visualization systems (e.g., Endeca, Tableau, and Splunk) that are tailored to domain-specific analyses, or manually design, implement, and optimize their own solutions. Unfortunately, both approaches typically slow down the creation process. In this paper, we describe the status of our progress towards an end-to-end relational approach in our data visualization management system (DVMS). We introduce DeVIL, a SQL-like language to express static as well as interactive visualizations as database views that combine user inputs modeled as event streams and database relations, and we show that DeVIL can express a range of interaction techniques across several taxonomies of interactions. We then describe how this relational lens enables a number of new functionalities and system design directions and highlight several of these directions. These include (a) the use of provenance queries to express and optimize interactions, (b) the application of concurrency control ideas to interactions, (c) a streaming framework to improve near-interactive visualizations, and (d) techniques to synthesize interactive interfaces tailored to end-users